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ABSTRACT 

High  assurance  systems  used  in  avionics,  medical  implants, 
and  cryptographic  devices  often  rely  on  a  small  trusted  base 
of  hardware  and  software  to  manage  the  rest  of  the  system. 
Crafting  the  core  of  such  a  system  in  a  way  that  achieves 
flexibility,  security,  and  performance  requires  a  careful  bal¬ 
ancing  act.  Simple  static  primitives  with  hard  partitions 
of  space  and  time  are  easier  to  analyze  formally,  but  strict 
approaches  to  the  problem  at  the  hardware  level  have  been 
extremely  restrictive,  failing  to  allow  even  the  simplest  of 
dynamic  behaviors  to  be  expressed. 

Our  approach  to  this  problem  is  to  construct  a  minimal 
but  configurable  architectural  skeleton.  This  skeleton  cou¬ 
ples  a  critical  slice  of  the  low  level  hardware  implementa¬ 
tion  with  a  microkernel  in  a  way  that  allows  information 
flow  properties  of  the  entire  construction  to  be  statically 
verified  all  the  way  down  to  its  gate-level  implementation. 
This  strict  structure  is  then  made  usable  by  a  runtime  sys¬ 
tem  that  delivers  more  traditional  services  (e.g.  commu¬ 
nication  interfaces  and  long-living  contexts)  in  a  way  that 
is  decoupled  from  the  information  flow  properties  of  the 
skeleton.  To  test  the  viability  of  this  approach  we  design, 
test,  and  statically  verify  the  information-flow  security  of 
a  hardware/software  system  complete  with  support  for  un¬ 
bounded  operation,  inter-process  communication,  pipelined 
operation,  and  I/O  with  traditional  devices.  The  resulting 
system  is  provably  sound  even  when  adversaries  are  allowed 
to  execute  arbitrary  code  on  the  machine,  yet  is  flexible 
enough  to  allow  caching,  pipelining,  and  other  common  case 
optimizations. 

Categories  and  Subject  Descriptors 

C.O  [Computer  Systems  Organization]:  General — Hard¬ 
ware/  software  interfaces;  C.3  [Special  Purpose  And  Ap- 
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plication  Based  Systems]:  [Real-time  and  Embedded  Sys¬ 
tems] 

General  Terms 

Design,  Security,  Reliability,  Verification 

Keywords 

High  Assurance  Systems,  Gate  Level  Information  Flow  Track¬ 
ing,  Non-interference 


1.  INTRODUCTION 

Systems  requiring  the  highest  levels  of  trust  are  often  de¬ 
signed  and  built  using  waterfall  principles,  modeled  at  a 
high-level  in  a  theorem  proving  system,  coaxed  through  said 
theorem  proving  system  by  hand,  documented  to  an  tremen¬ 
dous  degree  throughout  the  entire  process,  and  then  evalu¬ 
ated  by  a  trusted  third  party  or  evaluation  board  [2].  It 
is  estimated  that  the  entire  process  costs  over  $10,000  per 
line  of  code  [4]  and  takes  over  10  years  to  complete  [3].  In 
the  end,  there  is  more  evaluation  of  the  development  process 
than  the  final  end  artifact,  and  formal  properties  are  shown 
to  hold  only  for  hand-written  high-level  models  of  the  sys¬ 
tem  rather  than  for  the  actual  implementation  [15].  The 
ultimate  goal  of  our  work  is  to  create  full  system  implemen¬ 
tations  (including  both  hardware  and  software)  with  security 
properties  that  can  be  directly  measured  and  verified. 

In  particular,  we  are  interested  in  verifying  information 
flow  security  to  capture  security  requirements  such  as  con¬ 
fidentiality,  integrity,  and  even  real-time  guarantees.  The 
problem  with  verifying  information  flows  through  high-level 
specifications  (hand-written  or  otherwise)  is  that  these  mod¬ 
els  often  ignore  predictors,  caches,  buffers,  timing  variations, 
and  undocumented/unspecified  instruction  behaviors  that 
are  not  part  of  the  ISA-level  specification  of  the  architec¬ 
ture.  These  structures  greatly  complicate  accurate  informa¬ 
tion  flow  evaluations  because  they  can  be  used  to  infer  the 
values  of  secret  keys  [8,  19],  to  subvert  documented  protec¬ 
tion  mechanisms  [28],  and  to  covertly  transmit  information 
between  compartments  [23]. 

Recent  work  has  shown  how  information  flows  can  be 
strictly  managed  at  the  architecture  level  in  a  way  that  uni¬ 
fies  explicit,  timing,  and  storage  information  flows  [30,  31]. 
This  allows  many  of  the  security  properties  of  the  resulting 
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hardware/software  system  to  be  clearly  and  unambiguously 
analyzed.  However,  the  current  incarnations  of  these  ap¬ 
proaches  have  thus  far  come  at  the  cost  of  drastically  reduced 
flexibility.  For  example,  an  Execution  Lease  [30]  allows  for 
a  region  of  execution  to  be  tightly  quarantined  both  in  time 
and  space  (which  can  then  be  verified  with  gate-level  infor¬ 
mation  flow  tracking) ,  but  requires  that  each  and  every  lease 
have  a  hard  time  bound.  After  the  bound  expires,  control 
is  revoked  from  the  leasee,  meaning  unbounded  or  timing 
variable  operations  do  not  fit  cleanly  into  the  model. 

These  restrictions  present  numerous  challenges  with  re¬ 
gards  to  the  creation  of  real  systems.  The  hard  nature  of 
an  Execution  Lease  programming  model  does  not  naturally 
support  performance  enhancing  micro-architectural  features 
such  as  caches,  pipelining,  branch  predictors,  or  TLBs  be¬ 
cause  of  the  timing  variabilities  they  introduce;  it  lacks  suffi¬ 
cient  support  for  software  behaviors  that  cannot  be  bounded 
and  divided  into  fixed  regular  sized  chunks  of  work,  it  does 
not  provide  any  easily  verifiable  mechanisms  by  which  dif¬ 
ferent  trust  domains  can  communicate  safely;  and  it  makes 
handling  the  inherently  dynamic  nature  of  I/O  very  diffi¬ 
cult.  Furthermore,  information  flow  security  is  provided  us¬ 
ing  additional  analysis  logic  that  adds  substantial  area-delay 
overheads  to  the  deployed  system.  In  this  paper,  we  present 
our  experiences  building  a  full  system  that  removes  all  of 
the  above  restrictions  yet  is  still  verifiably  information  flow 
secure,  i.e.  conforms  to  a  specified  information  flow  policy. 

Our  method  for  overcoming  this  challenge  is  two- fold; 
first,  we  create  a  thin  skeleton  of  hardware,  that  when  con¬ 
figured  and  operated  by  a  small  piece  of  software,  describes 
a  minimal  functionality  with  which  the  information  flow  of 
the  rest  of  the  machine  can  be  governed.  In  essence,  this 
minimal  slice  of  the  hardware  is  to  the  entire  pro¬ 
cessor  core,  as  a  microkernel  is  to  a  full  operating 
system.  This  strict  structure  is  then  enhanced  by  a  runtime 
system  that  delivers  more  dynamic  or  non-information-flow- 
critical  services  (e.g.  communication  interfaces  and  context 
swapping) .  Because  these  operations  are  decoupled  from  the 
information  flow  properties  of  the  skeleton,  they  do  not  add 
complexity  to  the  verification  of  the  system  as  a  whole. 

To  make  this  idea  more  concrete,  consider  the  process  by 
which  a  context  is  saved  and  restored.  To  be  a  correct  con¬ 
text  switch,  the  registers  and  PC  (along  with  other  things) 
need  to  be  saved  off  to  a  region  of  memory,  and  then  restored 
when  the  process  is  scheduled  again.  However,  to  verify  the 
information  flow  properties  of  such  a  system,  we  need  only 
to  verify  that  the  context  is  saved  and  restored  in  a  way  that 
does  not  leak  data  and  violate  policy. 

To  demonstrate  these  principles  we  have  created  a  syn- 
thesizable  full-system  prototype,  complete  with  a  pipelined 
CPU,  a  micro- kernel  that  enables  isolation  and  communi¬ 
cation  by  explicitly  controlling  all  micro-architectural  state, 
and  an  I/O  subsystem  that  allows  off-the-shelf  I^C  devices 
to  be  connected  to  a  single  shared  bus.  Our  system  can 
provide  caches,  pipelines,  and  support  for  the  micro-kernel 
in  only  l/4th  the  area  and  with  double  the  clock  frequency 
as  more  restrictive  prior  work.  Finally,  for  a  system  of  size 
50K  logic  gates  (approximately)  and  with  only  3264b  out 
of  133kB  state  specified  concretely,  we  can  statically  verify 
that  the  entire  hardware-software  stack  conforms  to  a  spec¬ 
ified  information  flow  policy  all  the  way  down  to  its  gate 
level  implementation. 


Figure  1:  The  proposed  architectural  skeleton 
(shaded  black  in  the  CPU)  that  allows  explicit  soft¬ 
ware  control  over  the  entire  processor  state.  The 
processor  includes  dynamic  micro-architectural  fea¬ 
tures  such  as  caches  and  pipelining.  This  hardware 
skeleton  is  used  by  a  separation  kernel  to  manage  ex¬ 
ecution  time,  memory,  and  I/O  devices  among  mul¬ 
tiple  security  partitions.  We  also  introduce  trusted 
adapters  for  secure  I/O.  Here,  an  I^C  master  con¬ 
troller  on  the  CPU  manages  a  shared  bus  among 
off-the-shelf  I^C  devices  with  different  trust  levels. 
In  the  end,  we  verify  that  the  hardware  and  kernel 
together  enforce  a  desired  information  flow  policy 
such  as  non-interference. 


2.  INFORMATION  FLOWS  AND  RELATED 
WORK 

Systems  required  to  provide  critical  services  must  be  engi¬ 
neered  and  evaluated  to  a  very  high  level  of  assurance.  For 
example,  bus  interfaces  on  aircraft  multiplex  critical  aircraft 
control  data  with  data  of  peripheral  importance  (such  as 
passenger  internet)  on  the  same  shared  physical  bus,  and 
must  guarantee  that  the  integrity  of  the  critical  data  is  pre¬ 
served  under  all  circumstances  [13].  Similarly,  a  program 
that  uses  the  private  key  of  a  bank  should  be  demonstrably 
confidential  from  other  unclassified  programs.  While  strict 
adherence  to  information  flow  policies  alone  is  not  sufficient 
to  ensure  the  trustworthiness  of  a  system,  information  flow 
policies  are  certainly  necessary.  For  example,  in  the  case  of 
cryptographic  operations,  while  there  is  clearly  information 
flowing  from  the  key  to  the  encrypted  data,  we  also  need 
to  ensure  that  the  key  is  not  flowing  anywhere  else  (e.g., 
through  a  timing  channel) . 

At  the  highest  level  there  are  two  classes  of  approach  to 
this  problem:  best-effort  and  strict.  We  define  a  best-effort 
approach  as  one  that  attempts  to  manage  these  information 
flows  by  closing  known  existing  holes,  managing  uncloseable 
channels  through  statistical  techniques  such  as  clock  fuzzing, 
and  structuring  hardware  and  software  as  a  whole  to  make 
the  job  of  the  adversary  as  difficult  as  possible.  While  this  is 
more  than  sufficient  in  many  scenarios,  the  most  that  a  best- 
effort  approach  can  hope  to  achieve  is  a  demonstration  that, 
subject  to  the  threat  model,  no  known  attacks  are  feasible. 
A  strict  approach,  in  contrast,  carries  a  higher  burden  for 
proof  -  it  should  be  able  to  show  that,  subject  to  the  threat 


model,  no  attack  (known  or  unknown)  is  possible.  The  point 
of  this  paper  is  to  demonstrate  that,  while  this  added  burden 
is  still  great,  it  need  not  preclude  many  of  the  performance 
and  productivity  enhancing  features  we  take  for  granted  in 
more  dynamic  systems. 

Information  Flow  Policies:  In  this  work  we  target  se¬ 
curity  properties  such  as  confidentiality  and  integrity  (where 
real-time  performance  guarantees  are  part  of  timing-integrity) 
Both  these  properties  can  be  modeled  with  an  information 
flow  control  lattice  {C,  C),  where  £  is  the  set  of  security 
labels  and  C  is  the  partial  order  indicating  relative  secrecy 
or  integrity  of  the  labels  [12].  A  simple  lattice  to  represent 
integrity  is  Trusted  C  Untrusted;  its  dual  lattice  for  confi¬ 
dentiality  is  Unclassified  C  Secret.  Information  flow  policies 
are  then  specified  as  a  mapping  from  security  labels  to  each 
input  and  output  in  the  system,  and  our  goal  is  to  verify 
whether  it  is  possible  under  any  scenario  for  this  policy  to 
be  violated  (e.g.,  untrusted  data  never  flows  out  of  a  trusted 
output). 

Expressing  information  flow  policies  using  a  label  lattice 
has  a  few  noteworthy  characteristics.  The  policy  of  non¬ 
interference  [14]  allows  for  information  to  flow  up  in  the 
security  lattice,  i.e.  trusted  data  can  affect  untrusted  out¬ 
puts  (for  integrity)  and  unclassified  information  can  flow  to 
secret  outputs  (for  confidentiality),  and  thus  is  represented 
by  a  totally  ordered  lattice.  Complete  isolation,  on  the  other 
hand,  is  represented  by  a  lattice  comprising  of  mutually  un¬ 
ordered  labels.  Further,  in  a  real  system,  expressing  security 
as  a  lattice  of  labels  requires  that  the  kernel  have  the  lowest 
label,  e.g.  trusted  or  unclassified.  Thus  for  integrity,  the 
kernel  parameters  such  as  the  schedule  of  security  partitions 
or  the  partition  boundaries  themselves  are  trusted  and  can¬ 
not  be  tampered  by  untrusted  programs.  For  confidentiality, 
kernel  parameters  are  not  secret  and  learning  these  values  is 
not  a  security  violation.  To  ensure  security,  the  kernel  has 
to  ensure  that  secrets  or  untrusted  values  (i.e.  high  labels 
in  either  lattice)  do  not  leak  to  outputs  (including  memory 
addresses)  that  have  low  labels. 

Throughout  this  paper  we  use  trusted  and  untrusted  la¬ 
bels  to  enforce  a  policy  of  non-interference.  However,  the 
discussion  applies  to  labels  for  secrecy  and  to  labels  in  a 
general  lattice. 

Threat  Model:  We  treat  the  entire  system,  including 
the  processor  hardware,  micro- kernel,  I/O  controllers,  and 
all  the  user-level  programs,  as  a  single  monolithic  unit  for 
the  purposes  of  verification.  This  unit  has  some  known  state 
bits  (that  are  initialized  with  code  and  data  for  the  micro¬ 
kernel  and  I/O  master  controllers),  some  unknown  state  bits 
(such  as  unspecified  software  components  or  unknown  intital 
conditions),  and  a  set  of  external  input  and  output  ports. 
The  input  and  output  ports  are  assumed  to  have  a  set  of 
security  labels,  and  an  information  flow  policy  is  a  specified 
as  a  lattice  over  those  labels. 

The  attacker  is  assumed  to  have  complete  control  over  1) 
all  untrusted  inputs  to  the  device  and  2)  the  entire  set  of  bits 
which  are  unknown  at  analysis  time.  With  our  technique, 
we  can  statically  verify  that  the  type  of  data  flowing  to  any 
output  port  conforms  to  the  information  flow  policy  for  the 
label  of  that  port  (e.g.  no  untrusted  information  contami¬ 
nating  a  trusted  port,  and  no  secret  information  leaking  to 
a  non-secret  port).  This  threat  model  includes  timing  chan¬ 
nels,  implicit  flows,  storage  channels,  and  any  other  digital 
forms  of  information  flow,  but  does  not  include  the  use  of 
physical  phenomena  such  as  EM  emission  or  power  draw.  A 


system  is  said  to  be  strictly  enforcing  a  policy  if  it  can  be 
shown  that  the  policy  can  never  be  violated  regardless  of 
the  actions  of  the  attacker  subject  to  this  model. 

Formal  Methods  for  Information  Flow  Security: 
While  information  flow  tracking  has  been  proposed  at  many 
levels  of  the  computing  hierarchy,  from  virtual  machines  [16], 
high-level  languages  [26]  and  compilers  [34]  to  binary  anal¬ 
ysis  tools  [5]  and  even  hardware-assisted  information  flow 
tracking  systems  [29,  11,  25[,  formal  approaches  for  the  same 
have  been  confined  to  language- level  and  operating  system- 
level  proposals.  Approaches  that  operate  at  the  language 
level  can  even  track  implicit  flows  due  to  branches  and  loops 
that  introduce  non-determinism  into  a  program  execution. 
Since  code  that  is  never  executed  can  leak  information  (by 
the  absence  of  its  execution),  some  secure  languages  elimi¬ 
nate  non-deterministic  behavior  from  the  program  code  (ei¬ 
ther  entirely  or  based  on  confidential  or  untrusted  condition¬ 
als  [26[).  At  the  OS  level.  Flume  [22]  has  even  been  shown 
to  be  information  flow  secure  through  abstractions  such  as 
processes,  pipes,  file  systems  etc,  while  seL4  [18]  uses  au¬ 
tomated  theorem  proving  to  verify  safety  properties  of  the 
kernel. 

None  of  the  above  approaches  account  for  covert  flows 
through  architectural  state  hidden  beneath  the  hardware- 
software  interface  (i.e.,  the  processor’s  instruction-set  archi¬ 
tecture),  and  timing  channels  created  through  shared  archi¬ 
tectural  resources  like  on-chip  buses,  caches,  branch  predic¬ 
tors,  and  even  functional  units.  Such  covert  channels  have 
received  isolated  attention,  for  example  methods  to  build  se¬ 
cure  caches  [33]  or  branch  predictors  [6],  but  general  design 
methods  and  tools  are  lacking.  The  techniques  we  propose 
are  complementary  to  higher  level  approaches  and  would 
likely  work  best  when  informed  by  the  rich  semantic  infor¬ 
mation  available  at  the  language  level. 

Information  Flow  Security  at  the  Gate  Level:  While 
the  approaches  discussed  above  are  very  valuable,  discov¬ 
ering  many  of  the  most  subtle  and  exploitable  information 
channels  requires  that  we  analyze  designs  at  the  level  of  logic 
gates,  where  the  behavior  and  timing  of  the  machine  is  actu¬ 
ally  well  defined.  Analyzing  higher-level  specifications  can¬ 
not  provide  a  strong  guarantee  that  the  full  system  imple¬ 
mentation  adheres  to  desired  access  control  and  information 
flow  policies.  For  this  reason,  we  build  upon  recent  work  on 
Gate-Level  Information  Flow  Tracking  (GLIFT)  [31].  Based 
on  the  observation  that  all  digital  information  flows  are  a 
function  of  logical  information  flows  through  the  gates  and 
wires  in  a  circuit,  GLIFT  shows  that  a  gate-level  descrip¬ 
tion  of  a  processor  can  be  automatically  augmented  with 
shadow  logic-gates  that  dynamically  track  the  flow  of  in¬ 
formation  through  the  processor  and  can  identify  informa¬ 
tion  leaks  through  explicit,  implicit,  and  even  timing  chan¬ 
nels.  Execution  Leases  [30]  is  an  architecture  that  builds 
upon  GLIFT  and  allows  programmers  to  construct  a  stack 
of  nested,  space-time  sandboxes.  This  stack  of  memory  and 
execution-time  bounds  can  be  shown  to  be  information  leak 
free  at  the  gate  level,  and  having  the  benefit  of  knowing  the 
exact  signal  values  at  run-time,  can  provide  functionality 
such  as  secure  pre-emption  of  untrusted  programs  based  on 
trusted  interrupts  at  arbitrary  times. 

One  downside  of  a  dynamic  approach  is  that  shadowing 
every  gate  in  the  circuit  adds  a  considerable  hardware  over¬ 
head  over  the  base  logic  (up  to  3X  [30]).  Another  significant 
downside  is  that  dynamic  techniques  can  only  identify  in¬ 
formation  leaks  on  specific  executions  and  require  the  entire 


system  to  be  fully  specified.  High  assurance  systems  often 
require  static  guarantees  that  a  hardware/software  system 
conforms  to  a  specihc  information  flow  policy  even  when  cer¬ 
tain  components  are  neither  known  a  priori  nor  are  trustwor¬ 
thy.  In  contrast  to  dynamic  tracking  methods,  our  technique 
allows  properties  of  gate-level  descriptions  of  systems  to  be 
verified  statically  even  when  large  parts  of  the  system  such 
as  user-level  programs  and  peripheral  devices  are  not  known. 
Finally,  the  Execution  Lease  CPU  provides  a  very  restrictive 
programming  model  and  implementation  that  does  not  sup¬ 
port  full  systems  efficiently.  This  paper  addresses  the  above 
shortcomings  by  constructing  a  full  system  comparable  to 
extant  high  assurance  systems,  a  tool  for  static  gate-level 
information  flow  analysis,  and  using  the  tool  to  verify  a  spe¬ 
cihc  instance  of  the  system. 

3.  A  SECURE  ARCHITECTURAL 
SKELETON 

Our  architectural  skeleton,  working  in  conjunction  with 
the  micro-kernel,  must  deliver  each  of  the  following  capabil¬ 
ities  in  a  way  that  can  be  verihed  to  be  side-channel  free. 

f.  Verifiable  Common  Case  Optimizations  in  the 
CPU:  Techniques  such  as  caches  and  pipelining  are 
taken  for  granted  in  the  non  high-assurance  systems 
but  pipeline  stalls  and  cache  evictions  can  easily  intro¬ 
duce  side-channels.  While  countermeasures  for  these 
side-channels  exist,  we  must  be  able  to  formally  prove 
their  absence. 

2.  Verifiable  Context  Switches,  Scheduling,  and 
Communication  in  the  kernel:  Because  timing  chan¬ 
nels  are  part  of  our  threat  model,  the  micro-kernel, 
working  in  conjunction  with  the  skeleton,  must  have  a 
way  to  bound  the  behavior  of  a  software  partition.  Fur¬ 
thermore,  it  must  have  a  way  to  save  and  restore  pro¬ 
cess  contexts  without  leaking  information  about  those 
contexts,  to  schedule  these  processes  or  partitions  at 
arbitrarily  fine  granularities,  and  to  allow  inter-partition 
communication  in  a  tightly-controlled  manner. 

3.  Verifiable  I/O:  We  must  be  able  to  construct  a  sys¬ 
tem  that  is  able  to  communicate  with  the  outside  world. 
In  particular  we  must  allow  software  partitions  mea¬ 
sured  access  to  I/O,  and  this  access  must  be  able  to  ex¬ 
ploit  simple  off-the-shelf  I/O  protocols.  While  both  au¬ 
thentication  and  physical  attacks  are  beyond  the  scope 
of  this  paper,  we  must  ensure  that,  if  necessary,  infor¬ 
mation  flow  can  be  limited  to  a  subset  of  the  parties 
connected  on  the  I/O  network. 

4.  Verification  Technique:  While  prior  approaches  in¬ 
stantiate  expensive  information  flow  tracking  logic,  we 
aim  to  verify  the  concrete  hardware  implementation 
and  the  partial  software  specification  together  to  be 
completely  free  of  undesirable  information  flows.  To 
this  end,  we  propose  a  static  analysis  technique  that 
represents  all  unknown  values  with  the  abstract  value  * 
and  replaces  all  hardware  with  logic  capable  of  operat¬ 
ing  directly  on  >i<.  This  technique,  *-logic  (Star-Logic), 
then  computes  on  *  and  security  labels  to  effectively 
verify  information  flows  through  all  possible  executions 
arising  from  the  unknown  values. 

Understanding  our  approach  to  the  problem  can  be  dif¬ 
ficult  at  first  because  our  design  method  and  verification 
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Figure  2:  Implementing  caches:  The  processor¬ 
generated  memory  address  is  first  masked  off  used 
the  memory  bounds  register.  This  creates  a  physi¬ 
cal  address  that  is  within  the  currently  active  exe¬ 
cution  lease  bounds.  Most  significant  bits  of  this  ad¬ 
dress  are  then  further  masked  off  using  the  trusted 
partition  ID  register  to  generate  the  address  for 
the  cache.  As  a  result,  information  flow  from  a 
potentially  untrusted  memory  access  is  limited  to 
the  currently  enabled  portion  of  the  cache.  Since 
trusted  bits  from  the  partition  ID  register  control 
the  MUXes,  information  flow  control  can  be  veri¬ 
fied  precisely. 


method  are  intimately  linked.  The  verification  method 
works  because  our  designs  exhibit  incredibly  tight 
control  over  the  flow  of  information,  and  our  design 
method  is  useful  because  designs  can  be  verified  eas¬ 
ily-  Later,  in  Section  4,  we  will  describe  how  we  verified  a 
specific  incarnation  of  our  system,  but  for  now  we  can  think 
at  a  high  level  about  the  two  primary  methods  of  managing 
the  information  flow  in  our  skeleton. 

The  first  method  for  information  flow  control  is  to  ensure 
that  certain  critical  portions  of  the  machine  and  kernel  are 
always  kept  with  high  integrity,  i.e.  trusted.  If  we  can  verify 
that  the  system  will  never  breach  this  invariant,  then  these 
critical  bits  can  be  used  as  the  root  of  trust  for  the  rest  of 
system. 

The  second  method  is  carefully  time-multiplexing  the  rest 
of  the  state  between  multiple  security  levels.  There  are  two 
parts  to  this  second  method.  Because  these  time  multiplexed 
bits,  for  example  the  hardware’s  program  counter  or  the 
kernel’s  current  process  ID,  will  change  labels  over  time, 
we  must  bound  the  effect  that  these  bits  can  have  on  the 
system.  Then,  after  we  have  finished  a  unit  of  time,  we  must 
always  be  certain  to  “clean-up”  any  of  these  bits  remaining 
in  multiplexed  parts  of  the  system  (as  controlled  by  trusted 
bits) . 

In  short,  our  minimal  skeleton  working  with  the  separa¬ 
tion  kernel  ensures  that  “trusted  bits  stay  trusted”  and  that 
“untrusted  bits  always  get  cleaned  up”.  Both  these  methods 
can  then  be  verified  to  be  implemented  correctly  through  a 
gate- level  analysis.  Cleaning  up  untrusted  bits  in  a  verifi¬ 
able  manner  is  best  understood  using  a  multiplexer  (MUX): 
if  the  select  input  to  a  MUX  is  trusted,  and  it  selects  a 
trusted  value,  the  result  can  be  trusted  no  matter  what  that 
actual  value  is,  even  when  the  other  input  is  untrusted.  A 
MUX  can  thus  be  thought  of  as  a  gatekeeper  for  trust,  and  is 


Figure  3:  Secure  Pipelines:  The  state  machine 
shows  the  Program  Counter  update  logic  for  a 
pipelined  CPU.  From  all  untrusted  states  (in  gray), 
there  is  a  transition  to  a  trusted  state  that  is  trig¬ 
gered  by  a  trusted  timer,  and  hence  the  PC  is  always 
reset  verifiably.  The  pipeline  flush  requires  4  cycles 
since  our  prototype  CPU  has  a  4-stage  pipeline.  The 
dashed  line  from  start  to  memory  stall  is  to  indicate 
that  while  a  memory  stall  is  possible,  kernel  code 
ensures  that  no  memory  access  misses  in  the  cache. 


used  to  implement  logic  that  resets  critical  system  state  to  a 
trusted  value  or  masks  out  untrusted  signals.  Together,  both 
these  methods  allow,  for  example,  the  kernel  bits  that  store 
the  partition  schedule  to  always  stay  trusted  and  control 
the  MUXes  to  reset  the  program  counter  from  an  untrusted 
state. 

In  the  rest  of  this  section  we  will  show  how  the  combina¬ 
tion  of  bit-level  isolation  and  trusted  time-multiplexing  al¬ 
lows  us  to  first  implement  a  skeleton  that  addresses  the  four 
requirements  listed  above  and  then  formally  verify  that  the 
entire  system  conforms  to  a  desired  information-flow  policy. 

3.1  CPU:  Using  Caches,  Pipelines,  and  Other 
Micro-architectural  Structures 

Caches.  Side-channels  through  caches  have  been  shown 
to  leak  information  about  private  keys  [7,  32].  These  at¬ 
tacks  exploit  the  ability  of  an  attacker  to  learn  the  memory 
accesses  of  a  secret  process  by  flrst  Ailing  the  cache,  yield¬ 
ing  to  the  secret  process,  and  then  inspecting  which  of  its 
own  memory  accesses  miss  in  the  cache.  The  fundamental 
problem  is  that  the  cache  controller  uses  both  secret  and 
unclassifled  information  to  decide  which  cache  lines  to  evict. 
Counter-measures  to  this  attack  include  pinning  secret  lines 
in  the  cache  [32]  (which  was  flrst  shown  to  be  vulnerable  [20] 
and  then  fixed  [21]),  and  proposed  secure  caches  may  still 
be  vulnerable  to  collision-based  timing-driven  attacks  [20]. 

To  implement  an  information  flow  secure  cache,  we  design 
the  cache  controller  to  only  use  trusted  values  to  control  the 
cache  contents:  i.e.  by  partitioning  the  cache  or  by  clear¬ 
ing  the  cache  based  on  trusted  parameters  after  untrusted 
or  secret  code  has  finished  executing.  This  ensures  that  un¬ 
trusted  information  is  confined  to  its  partition,  while  trusted 
information  can  flow  to  untrusted  partitions  (or  unclassified 
to  secret). 

To  implement  partitioned  caches  that  are  verifiably  iso¬ 
lated  at  the  bit-level,  we  allow  only  power-of-2  aligned  cache 
partitions  that  are  configured  by  the  kernel  through  a  par¬ 


tition  ID  register  (as  shown  in  Figure  2).  The  partition  ID 
register  represents  currently  enabled  cache  partition(s)  and 
uses  two  bits  for  each  bit  of  the  cache  address.  Of  these  two 
bits,  if  the  MSB  is  1  the  cache  address  uses  the  processor¬ 
generated  memory  address  else  the  LSB  of  the  partition  ID 
register  is  used.  For  example,  for  a  4b  cache  address,  a  parti¬ 
tion  ID  of  00_00_11_11  implies  that  the  two  most  significant 
bits  of  the  cache  address  will  be  00,  and  the  two  lower  bits 
will  be  used  from  the  address  generated  by  the  processor. 
The  partition  ID  register  set  to  all  Is  will  indicate  that  the 
entire  cache  is  available  for  use.  The  kernel  sets  the  parti¬ 
tion  ID  register  before  jumping  to  untrusted  code  using  the 
instruction  set_partitionID  immediate.  The  cache  con¬ 
troller  also  communicates  with  a  memory  controller  in  case 
of  a  read-miss  or  write-evict,  and  squashes  an  outstanding 
memory  request  when  the  execution  time  slot  for  some  un¬ 
trusted  code  ends. 

This  cache  controller  logic  can  be  verified  to  be  secure  at 
the  bit  level  because  the  MUXes  that  select  the  final  cache 
address  are  controlled  by  the  trusted  partition  ID  register 
(Figure  2).  While  prior  work  has  proposed  that  unclassified 
code  pre-load  AES  tables  into  locked  cache  lines  or  clear 
the  entire  cache  after  secret  execution  [21],  we  are  able  to 
implement  the  mechanism  and  verify  it  at  the  gate-level. 

Pipelining.  Pipelines  are  challenging  to  implement  in  an 
information  flow  secure  manner  since  they  introduce  unpre¬ 
dictable  dynamic  behavior  through  memory  stalls,  branch 
prediction,  register  forwarding  etc.  Our  key  insight  is  that 
such  dynamic  behavior  can  be  allowed  as  long  as  untrusted 
programs’  effects  do  not  spill  over  into  trusted  space  and 
time  slots. 

Figure  3  shows  the  state  machine  that  controls  the  pro¬ 
gram  counter  (PC)  in  our  CPU.  The  state  machine  begins  in 
a  start  state  where  no  PC  lease  is  currently  on  and  trusted 
kernel  code  is  expected  to  execute.  It  can  set  a  timer  and 
transition  to  the  lease_on  state.  This  state  is  typically 
used  to  run  untrusted  or  secret  programs,  but  the  transi¬ 
tion  to  this  state  is  based  on  a  trusted  jump  instruction. 
On  a  cache  miss,  the  state  machine  can  transition  to  mem- 
ory_stall  state.  From  an  untrusted  instruction,  this  tran¬ 
sition  will  be  untrusted  and  the  memory  stall  state  will  be 
untrusted  too.  When  the  PC  timer  expires,  however,  both 
lease_on  and  memory_stall  states  transition  to  a  sequence 
of  4  states,  one  for  each  stage  of  the  pipeline,  where  the  PC  is 
first  restored  to  a  trusted  value  (and  then  incremented  each 
cycle).  The  logic  to  implement  this  sequence  is  hardwired 
instead  of  using  the  jump  instruction  because  general  pur¬ 
pose  registers  may  themselves  be  untrusted  at  the  end  of  a 
lease  to  untrusted  code.  Since  the  restore  PC  is  stored  in  the 
trusted  lease  unit,  computing  the  next  PC  from  this  main¬ 
tains  the  PC  as  trusted.  Further,  since  the  lease  timer  has 
expired,  no  instructions  are  committed  during  this  sequence 
of  states. 

Other  micro-architectural  features  can  also  be  employed 
safely  through  a  combination  of  trusted  partitioning  and 
time-multiplexing.  While  we  do  not  implement  a  branch 
predictor  or  prefetcher,  implementing  these  would  require 
their  state  to  be  either  partitioned  using  trusted  parameters 
or  flushed  at  the  end  of  every  security  context  by  the  kernel. 

3.2  Micro-Kernel:  Context  Switches,  Schedul¬ 
ing  and  Communication 

A  kernel  partition  encapsulates  all  computational  resources 
required  by  a  security  level,  comprising  of  time,  memory,  and 


optionally  I/O  interfaces.  A  portion  of  instruction  and  data 
memory  are  reserved  for  each  security  level,  and  when  a  par¬ 
tition  is  actually  scheduled  to  run,  it  gains  control  over  part 
of  the  machine  such  as  execution  units  and  register  files  for  a 
trusted  amount  of  time.  To  prevent  information  leaks  to  un¬ 
trusted  programs,  kernel  parameters  such  as  time  and  mem¬ 
ory  slots  allocated  to  each  partition  and  the  overall  number 
of  partitions  depend  only  on  trusted  constants  assigned  at 
boot  time.  As  a  result,  the  kernel  scheduler  implements  a 
statically  determined  schedule  which  can  act  as  a  coarse- 
grain  first  level  scheduler,  while  each  partition  implements  a 
second-level  scheduler  to  optimize  performance  within  their 
own  time  bounds  (as  proposed  before  for  real-time  [24]  and 
highly  secure  [18]  systems). 

Precise  Context  Switches:  The  kernel  ensures  con¬ 
tinuous  unbounded  operation  by  saving  and  restoring  user 
programs’  state  on  every  context-switch.  To  support  precise 
control  over  timing  in  the  presence  of  caches  and  pipelines, 
we  introduce  two  unique  features  in  our  micro-kernel.  The 
kernel  explicitly  manages  all  micro-architectural  state  in  the 
processor,  e.g.  through  the  partition  ID  register  to  enforce 
cache  partitions,  and  has  perfectly  imperturbable  execution 
time  for  each  kernel  function,  e.g.  by  never  having  a  memory 
access  miss  in  the  cache. 

To  demonstrate  these  ideas,  we  step  through  the  con¬ 
text  switch  routine  that  is  triggered  each  time  a  set  timer 
expires.  After  a  pipeline-length  delay  to  flush  the  entire 
pipeline  state,  the  first  kernel  instruction  to  commit  is  a 
set_partitionID  immediate.  This  activates  the  kernel  par¬ 
tition  which  stores  complete  context  information  for  all  par¬ 
titions.  The  kernel  then  stores  general  purpose  registers 
in  trusted  addresses  specifically  earmarked  for  the  parti¬ 
tion’s  context.  This  is  possible  since  the  number  of  par¬ 
titions  is  a  trusted  kernel  parameter.  The  kernel  also 
stores  the  last  PC  that  entered  the  commit  stage  when  the 
timer  expired,  accessing  it  through  the  (last_PC  Memfreg]) 
instruction.  Once  the  current  context  is  saved,  the  ker¬ 
nel  loads  the  general  purpose  registers  for  the  new  par¬ 
tition.  It  then  sets  the  cache  partition  available  for  the 
next  partition  (using  set_partition),  sets  memory  bounds 
(set_membounds  global/local  and  finally  sets  a  time  limit 
for  the  new  partition  to  execute  (set_timer).  The  mode 
argument  to  set_timer  indicates  whether  the  new  context 
will  execute  in  kernel  or  user  mode  (since  the  partitionID 
register  can  only  be  set  in  kernel  mode).  Finally,  once  the 
new  partition’s  context  is  loaded  and  the  bounds  are  set,  the 
kernel  loads  the  user-space  PC  and  jumps  to  it. 

Since  the  kernel  has  its  own  reserved  partition  in  the 
cache,  each  of  its  memory  accesses  is  a  cache  hit,  and  be¬ 
cause  there  are  no  branches  in  its  code,  the  kernel  never 
has  a  pipeline  stall.  Thus  context  switches  always  complete 
within  a  fixed  execution  time.  Further,  the  kernel  explicitly 
controls  the  state  of  all  micro-architectural  features,  using 
partitionID  register  for  the  cache,  and  relying  on  the  pro¬ 
cessor  skeleton  to  reset  the  state  of  the  pipeline,  memory 
controller  and  the  I/O  bus-controller  when  a  lease  timer  ex¬ 
pires.  Finally,  while  not  implemented  in  our  prototype,  the 
kernel  can  optionally  save  and  restore  the  entries  on  the  lease 
stack.  Saving  the  lease  stack  for  each  partition  allows  sched¬ 
ulers  that  run  inside  partitions  to  use  different  scheduling 
granularities  than  that  of  the  kernel,  without  being  aware  of 
the  time  bounds  that  they  themselves  run  within. 

Fine-grained  Scheduling.  We  propose  a  novel  hard¬ 
ware  stack  implementation  that  allows  execution  time  and 


memory  bounds  to  have  arbitrary  durations  and  yet  is  ver- 
ifiably  secure  at  the  bit  level.  The  time  and  space  bounds 
for  each  partition  are  stored  in  a  hardware  stack  in  the  CPU 
skeleton.  This  stack  has  to  verifiably  ensure  that  code  in¬ 
side  a  partition  cannot  over-write  its  own  lease  bounds,  i.e. 
the  current  stack  pointer  should  never  affect  any  stack  entry 
below  the  top  of  stack.  The  implementation  in  Execution 
Leases  [30]  protects  lower  stack  entries  by  bit-encoding  the 
timer  values  (e.g.  each  bit  represents  32  cycles),  and  con¬ 
structing  each  timer  entry  to  have  less  time  units  than  the 
previous  entry.  This  restricts  scheduling  granularities  to  be 
aligned  with  the  bit-encoding  and  introduces  artificial  per¬ 
formance  overheads. 

We  provide  a  more  flexible  stack  implementation  by  en¬ 
coding  the  safety  property  in  the  logic  for  the  stack  pointer 
instead  of  the  timer  values  (Figure  4).  By  choosing  the  stack 
pointer  as  the  critical  state,  we  free  up  the  timers  themselves 
to  be  assigned  arbitrary  values.  We  encode  the  stack  pointer 
so  that  each  bit  corresponds  to  a  stack  entry,  and  on  every 
clock  cycle,  the  value  assigned  to  each  bit  of  the  stack  pointer 
is  predicated  upon  all  the  lower  stack  pointer  bits  being  true; 
i.e.  a  stack  entry  is  valid  only  if  all  timers  lower  than  itself 
are  still  active.  Thus  while  information  clearly  flows  from 
lower  stack  pointer  bits  to  higher  ones,  higher  bits  never  af¬ 
fect  the  lower  ones.  Our  two  information  flow  control  tech¬ 
niques  can  be  observed  here:  while  timer[0]  is  permanently 
trusted,  the  rest  of  the  stack  is  time-multiplexed  between 
both  security  levels. 

We  modified  the  set_timer  instruction  to  receive  arbi¬ 
trary  values  as  the  time  limit,  and  added  a  mode  bit  that  a 
caller  can  use  to  specify  whether  the  callee  executes  in  kernel 
or  user  mode.  This  bit  is  set  to  0  by  the  kernel  when  it  sched¬ 
ules  a  user-space  code  in  order  to  protect  the  partition_ID 
register.  Our  CPU  implementation  has  three  separate  lease 
stacks  of  2  entries  each,  one  for  execution  time  (PC)  and  one 
each  for  local  and  global  memory  bounds. 

Read/ Write  Protection:  Information  flow  policies  such 
as  non-interference  [14],  Biba  [10],  and  Bell  &  La  Padula  [9] 
allow  information  to  flow  along  one  direction  in  a  security 
lattice,  e.g  trusted  parameters  can  be  read  but  not  tampered 
with.  In  order  to  support  security  policies  that  allow  uni¬ 
directional  information  flow,  we  modify  the  lease  implemen¬ 
tation  to  support  read-only,  write-only,  or  read- write  control 
over  memory  bounds.  In  contrast,  the  original  implemen¬ 
tation  of  Leases  allowed  complete  access  to  memory  regions 
that  are  specified  as  part  of  the  lease  and  could  only  support 
isolation  in  memory.  The  kernel  uses  the  instruction  set- 
membound  global/local,  timer,  addr_range,  RO/WO/RW to 
specify  whether  the  memory  region  is  Read-Only,  Write- 
Only,  or  Read- Write,  and  the  memory  controller  enforces 
these  permissions  at  run-time. 

To  enforce  isolation,  we  specify  non-overlapping  memory 
bounds  for  the  partitions^.  To  implement  communication, 
one  option  is  for  two  communicating  partitions  to  share  a 
memory  region.  This  requires  the  partitions  to  be  initialized 
so  that  communicating  partitions  are  adjacent.  Since  our 
CPU  implements  two  distinct  memory  regions  per  partition, 
one  partition  can  share  memory  areas  with  a  maximum  of 
four  other  partitions  to  facilitate  zero-copy  communication. 


^Embedded  systems  typically  operate  exclusively  on  physi¬ 
cal  memory,  eliminating  channels  associated  with  both  alias¬ 
ing  and  dynamic  allocation 
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Figure  4:  Implementing  flexible  timers  with  bit- 
level  isolation.  Isolation:  Untrusted  code  (in  gray) 
does  not  taint  the  lowest  stack  entry  (1),  and  when 
the  timer  expires,  the  current  timer  value  becomes 
trusted  (path  2-3-4-6).  A  timer,  when  it  expires, 
resets  all  timers  above  itself  (5)  and  thus  enforces 
nested  leases.  Flexibility:  Instead  of  timer  values, 
the  stack  pointer  is  encoded  to  ensure  bit-level  iso¬ 
lation.  Hence  the  actual  timers  can  be  assigned  ar¬ 
bitrary  values. 


For  more  complex  communication  patterns,  the  microkernel 
has  to  move  data  into  partition  “inboxes”. 

3.3  I/O:  Using  Off-The-Shelf  Protocols  and 
Devices  Securely 

We  complete  our  embedded  system  by  implementing  an 
I/O  protocol  to  connect  the  CPU  and  separation  kernel  sys¬ 
tem  to  peripheral  devices.  I^C  is  a  serial  two- wire  bus  proto¬ 
col  that  is  commonly  used  in  many  embedded  systems,  for 
e.g.  to  configure  RF  tuners,  video  decoders  and  encoders 
and  audio  processors.  It  is  also  present  on  chipsets  designed 
by  Philips,  National  Semiconductor,  Xicor,  Siemens,  and 
many  others  [17].  We  show,  for  the  first  time,  that  it  is  pos¬ 
sible  to  implement  a  provably  secure  I^C  master  controller 
that  can  then  interface  with  commodity  I^C  devices. 

Implicit  information  leaks  occur  when,  for  example,  the 
I^C  bus  master  hrst  communicates  with  an  untrusted  slave, 
and  then  with  a  trusted  slave.  The  master’s  current  state 
will  first  depend  on  information  received  from  the  untrusted 
slave,  and  appear  as  a  covert  channel  underneath  the  ISA 
to  software,  where  the  untrusted  slave  affects  the  timing  of 
trusted  communication.  At  the  gate-level,  this  implicit  flow 
takes  the  form  of  an  explicit  ACK  message  from  the  un¬ 
trusted  slave  to  the  trusted  master’s  state  machine,  causing 
the  master’s  state  machine  to  be  labeled  untrusted.  Thus 
even  if  the  bus  master  seems  to  be  behaving  “correctly”  and 
the  devices  are  not  snooping  on  the  bus,  there  are  still  in¬ 
formation  flows  between  devices  on  the  bus. 

Trusted  Bus  Adapter:  To  restrict  these  implicit  flows, 
we  propose  to  overlay  a  time  division  multiplexed  (TDMA) 
schedule  over  an  I^C  bus,  and  introduce  adapters  to  con¬ 
nect  external  devices  to  the  shared  system  bus  (Figure  1). 
Each  adapter’s  time  slot  is  a  trusted  kernel  parameter,  and 
the  adapters  enforce  that  for  each  slot  only  the  currently 
addressed  device  has  access  to  the  bus  while  the  remain- 
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Figure  5:  Our  toolchain  for  verifying  information 
flow  properties  of  embedded  systems.  Once  the  de¬ 
sign  has  been  debugged  using  conventional  tools,  our 
abstraction  and  augmentation  tools  create  a  new  de¬ 
sign  that  operates  on  security  labels  and  unknown 
values  in  addition  to  traditional  digital  values.  This 
augmented  design  can  then  be  simulated  using  stan¬ 
dard  hardware  simulators  to  generate  output  labels. 
These  are  then  compared  with  labels  specified  by 
the  desired  information  flow  policy  to  determine  if 
the  design  conforms  to  the  policy. 

ing  adapters  disconnect  their  corresponding  slave  devices. 
When  a  lease  timer  expires,  the  bus  master  state  machine 
is  reset  to  a  trusted  state  to  eliminate  the  implicit  flows 
mentioned  earlier.  We  implement  the  adapters  to  not  only 
impose  a  TDMA  schednle  on  the  bus  but  also  conform  to 
the  I^C  specihcation.  The  adapters  do  so  by  driving  the 
clock  signal  to  a  device  low  when  its  slot  has  expired,  and 
since  the  I^C  protocol  doesn’t  rely  on  wall-clock  time,  the 
devices  hold  on  to  data  until  the  I^C  clock  goes  high  again. 
As  a  result,  we  can  use  unmodified  P  C- compliant  devices  in 
our  I/O  subsystem. 

The  CPU  -  I/O  Interface:  The  I^C  Bus  Master  state 
machine  is  implemented  in  hardware  as  part  of  the  CPU. 
Programs  in  each  partition  (device  drivers  local  to  each 
partition)  use  two  instructions  in  dest_reg,  dev_addr  and 
out  dev_addr,  src_reg  to  transfer  a  32b  value  between 
a  register  and  an  I^C  device  that  the  partition  is  allowed 
to  address.  Since  peripheral  devices  are  typically  slower 
than  a  CPU,  a  device  driver  can  use  the  instruction  i2c_on 
dest_reg  to  record  into  dest_reg  whether  the  I^C  bus  is 
currently  in  the  middle  of  a  transmission. 

We  have  described  a  system  that  manages  the  flow  of 
unknown  and  untrusted  bits  such  that  arbitrary  untrusted 
computation  is  tightly  bounded  by  trusted  bits.  The  chal¬ 
lenge  now  is  to  verify  that  an  implementation  of  the  high- 
level  description  correctly  enforces  an  information  flow  pol¬ 
icy. 

3.4  Verification:  Removing  Analysis  Logic 
Overhead 

Given  a  system  in  which  the  hardware  and  software  co¬ 
operate  to  enforce  a  specihc  policy,  we  want  to  verify  that 
the  measures  taken  by  a  small,  known  portion  of  the  soft- 
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Figure  6:  Given  a  concrete  design  to  be  verified  and  a  specification  of  system  state  and  inputs  whose  values 
are  unknown,  Step  1  of  our  technique  automatically  generates  an  abstract  representation  of  the  design  that 
covers  all  possible  concrete  executions.  In  Step  2,  the  technique  augments  the  abstract  system  with  the 
state  and  logic  to  track  the  flow  of  security  labels.  The  resulting  augmented  system  is  then  synthesized  and 
simulated  using  hardware  design  tools  such  as  ModelSim  and  Altera  Quartus.  Augmented  logic  uses  security 
labels  specified  by  the  user  for  inputs  and  memory  bits  and  generates  output  labels  that  are  compared  with 
the  desired  output  labels  as  specified  by  the  information  flow  policy. 


ware  and  hardware  will  be  sufficient  to  prevent  unauthorized 
labels  from  ever  appearing  at  certain  memory  locations  and 
output  ports.  This  policy  of  non-interference  prevents  un¬ 
trusted  information  from  leaking  to  trusted  ports,  but  allows 
trusted  information  to  appear  on  untrusted  output  ports  (or 
memory  locations). 

Our  static  analysis  technique,  4<-logic  (star-logic),  works 
on  a  given  hardware-software  design  in  three  steps:  first  it 
creates  an  abstract  design  that  soundly  estimates  all  possible 
executions  of  the  given  design,  then  it  augments  the  abstract 
design  with  information  flow  tracking  logic,  and  finally  sim¬ 
ulates  this  augmented  design  until  all  possible  states  of  the 
abstract  design  are  explored.  Figure  5  shows  how  these  three 
steps  fit  into  the  traditional  hardware  design  flow,  and  Fig¬ 
ure  6  presents  the  individual  steps  in  more  detail. 

Step  1:  Abstraction.  The  designer  implements  a  sys¬ 
tem’s  trusted  computing  base  and  specifies  the  rest  of  the 
system  as  unknown  (represented  as  *),  where  the  unknown 
part  represents  the  contents  of  memory  for  all  the  secret 
and  unclassified  processes  and  all  the  input  ports.  The  *- 
logic  tool  takes  the  concrete  implementation  and  a  speci¬ 
fication  of  unknown  parts  to  generate  an  abstract  system 
that  operates  on  both  known  and  unknown  values.  An 
abstract  2-input  AND  gate  is  thus  a  logic  function  from 
{0, 1,  *}  X  {0,  1,  *}  — F  {0,  1,  >!<},  and  evaluates  its  inputs  to 
decide  whether  the  output  is  a  0,  1,  or  unknown.  This  tool 
builds  upon  the  observation  in  GLIFT  that  if  one  input  of  a 
2-input  AND  gate  is  a  0,  the  output  is  a  0  and  the  other  in¬ 
put  is  ineffective  even  if  it  is  unknown  (*).  For  other  input 
cases,  the  unknown  input  will  cause  the  output  to  be  un¬ 
known.  The  purpose  of  this  abstract  system  is  to  track  the 
flow  of  unknown  values  through  the  system,  and  in  doing 
so  strictly  over-approximate  all  possible  executions  of  the 
partially  specified  system.  As  a  result,  simulating  the  ab¬ 
stract  design  will  mark  as  unknown  all  state  that  can  ever 
be  affected  by  the  unknown  inputs.  This  soundness  prop¬ 
erty  can  be  proven  formally  using  the  Abstract  Interpreta¬ 
tion  framework  [27],  but  the  formal  proof  is  out  of  scope  of 
this  manuscript. 

Step  2:  Augmentation.  The  augmentation  step  takes 
two  inputs  as  well:  the  abstract  design  from  the  first  step. 


and  an  information  flow  lattice  (such  as  trusted  C  untrusted) 
that  specifies  a  set  of  labels  and  implies  rules  for  how  they 
they  are  to  be  propagated.  This  second  tool  then  convolves 
the  lattice  with  the  abstract  design  to  create  a  new  design 
that  operates  on  both  abstract  values  and  labels,  effectively 
propagating  whether  each  bit  is  unknown  and/or  untrusted. 
Using  the  truth  table  for  an  AND  gate,  we  can  construct 
an  augmented  truth  table  where  the  inputs  can  assume  a 
value  that  is  one  of  {0^,  1^,  0*^,  0^,  (encoded  re¬ 

spectively  as  {000,001,010,011,110,111}),  and  the  output 
is  computed  to  be  one  of  the  same.  The  first  bit  in  the  new 
tuple  of  values  that  the  gate  will  operate  on  is  0  if  the  original 
value  is  concrete  or  1  if  the  value  is  unknown.  The  second  bit 
is  the  actual  value  if  known  and  1  otherwise.  The  final  bit  is 
0  for  U  and  1  for  T.  Thus  an  n-input  m-output  digital  gate 
is  replaced  by  a  gate  with  3n  inputs  and  produces  3m  out¬ 
puts.  These  augmented  gates  are  then  interconnected  just 
as  original  gates  of  the  system  were.  The  augmented  logic, 
again  being  a  hardware  design  itself,  can  then  be  simulated 
using  existing  hardware  synthesis  and  simulation  tools. 

Step  3:  Simulation.  The  final  step  in  the  process  is 
to  simulate  the  resulting  augmented  system  to  check  that 
it  conforms  to  a  specified  information  flow  policy  for  ev¬ 
ery  possible  state  of  the  augmented  machine.  Covering  the 
set  of  all  possible  logic  states,  is  made  tractable  by  the  ab¬ 
straction  from  step  1,  and  because  the  concretely  specified 
system  that  comprises  the  trusted  computing  base  only  has 
a  practically  enumerable  number  of  states  (a  scenario  very 
common  to  high  assurance  systems).  To  specify  information 
flow  policies  a  designer  has  to  initialize  the  system  by  as¬ 
signing  security  labels  for  each  input,  output,  and  memory 
location.  During  simulation,  if  an  output  or  some  state  bit 
of  the  system  is  found  to  have  an  illegal  label  (e.g.  a  trusted 
output  port  has  an  untrusted  label),  then  the  system  under 
test  has  to  be  modified  for  correct  information  flow  control. 
In  this  way  dataflow  assertions  (such  as  “no  classified  infor¬ 
mation  should  egress  the  system  via  port  2”)  can  be  checked 
as  standard  logic  assertions  (“the  label  bit  for  out-port  2 
should  never  be  set  to  True”). 


4.  RESULTS 

Figure  5  shows  our  toolchain  to  analyze  hardware  designs 
written  in  behavioral  Verilog  or  VHDL  (so  that  hardware 
designers  can  use  their  tools  of  choice  for  design  entry). 
Verilog/VHDL  designs  are  synthesized  using  Synopsys  De¬ 
sign  Compiler  into  a  gate- level  netlist  using  the  and_or.db 
library.  The  result  of  this  synthesis  is  a  netlist  that  con¬ 
sists  of  just  AND,  OR,  and  NOT  gates  along  with  registers 
and  memory.  This  netlist  is  input  to  our  abstraction  tool, 
which  replaces  gates  and  bits  of  the  netlist  with  their  ab¬ 
stract  counterparts  and  outputs  the  abstract  netlist.  The 
abstract  netlist  is  then  input  to  our  augmentation  tool  that 
generates  information  flow  tracking  logic  for  the  abstract 
design  to  create  the  final  netlist.  Finally,  the  augmented 
design  is  simulated  using  hardware  synthesis  and  simulation 
tools  such  as  Altera  Quartus. 

4.1  CPU  Implementation 

This  section  presents  implementation  details  of  our  CPU 
[Star- CPU)  and  compares  its  functionality  and  area-delay 
with  prior  work.  We  implemented  the  Star-CPU  in  Verilog, 
generated  a  gate-level  netlist  using  Synopsys  Design  Com¬ 
piler,  and  synthesized  this  using  QuartusII  v9.1  with  Altera 
EP2S15F48C43  FPGA  as  the  target  device.  The  Star-CPU 
pipeline  is  single-issue,  executes  in-order,  and  has  4  stages 
(fetch,  decode,  execute,  and  commit /write-back).  It  has  8 
general  purpose  registers,  a  mode  bit  to  indicate  kernel/user 
mode,  and  a  partition  ID  register  to  record  the  current  secu¬ 
rity  context.  The  memory  hierarchy  includes  a  2kB  direct- 
mapped  data  cache,  and  64kB  each  of  instruction  and  data 
memory.  The  data  cache  is  implemented  on  the  FPGA  us¬ 
ing  comparator  logic  and  registers  and  requires  one  cycle  if 
a  memory  access  is  a  hit,  while  the  memory  is  implemented 
using  on-chip  block  RAMs  that  take  two  cycles  to  service 
a  memory  request.  To  emulate  memory  access  latency  in 
an  ASIG  implementation  of  the  system,  the  memory  con¬ 
troller  is  implemented  to  introduce  an  additional  delay  of 
100  cycles.  Without  micro-architectural  features  such  as 
branch  predictors,  TLBs,  and  Out-of-Order  execution,  the 
Star-CPU  pipeline  stalls  on  each  cache  miss  and  requires 
the  compiler  to  ensure  that  a  register  used  in  a  conditional 
jump  instruction  has  the  desired  value  at  least  4  instructions 
before  it  is  used. 

Area-Delay  Comparison.  Figure  8  quantifies  the  size 
and  performance  advantages  of  the  Star-CPU  against  the 
Execution  Lease  CPU  and  against  the  Star-CPU  with  dy¬ 
namic  CLIFT  logic  (Star-GLIFT).  The  Star-CPU  provides 
caches,  pipelining,  and  kernel  support  beyond  the  Lease 
CPU  in  equivalent  area  and  clock-frequency,  and  provides 
static  security  guarantees  compared  to  Star-GLIFT  in  al¬ 
most  1/4  the  logic,  1/2  the  memory,  and  2X  the  clock- 
frequency. 

The  Star-CPU’s  base  functionality  is  implemented  in  5756 
ALUTs  (Adaptive  Look-Up  Tables  in  an  Altera  FPGA,  where 
1  ALUT  corresponds  very  approximately  to  9-12  gates),  and 
while  the  base  functionality  in  the  Lease  CPU  requires  only 
1511  ALUTs,  it  requires  5040  ALUTs  when  the  dynamic 
analysis  logic  is  factored  in  [30].  Thus  the  Star-CPU  re¬ 
places  analysis  logic  overhead  with  a  cache  and  pipeline 
logic.  In  terms  of  performance,  the  Star-CPU  and  Lease 
CPU  have  similar  frequencies  (99  MHz  vs.  104MHz),  but 
the  unpipelined  Lease  CPU  only  commits  one  instruction 
every  5  cycles.  Further,  without  a  cache,  every  memory  ac¬ 
cess  in  the  Lease  CPU  goes  to  main  memory. 


Instruction 

Description 

set_timer  Rl,  R2,  R3 

Set  PC  lease.  Arguments  R#:  register  or  immediate. 

Rl:  timer,  R2:  restore  PC,  R3:  kernel/user  mode 

set_membound  Rl,  R2,  R3 

Set  local  or  global  memory  bounds. 

Rl:  memory  range,  R2:  timer,  R3:  read/write  mode 

set_partitionlD  Immediate 

If  mode  ==  kernel,  then  partitionID  =  Immediate 

last_PC  [Rl] 

Mem[Rl]  =  PC  in  commit  stage  when  the  last  timer  expired 

jgtz  /jump  R2,  Rl 

Jump  if  Rl  >=  0  or  unconditionally.  PC  =  R2  or  Memory(R2] 

load/store/mov  R2,  Rl 

Immediate  and  register  direct  addressing  modes 

add,sub,lsh,rsh  R3 

and,or,not,cmplt 

ALU  instrucb'ons.  Register  arguments 

in/out  Rl,  dev_addr 

Read  and  write  to  Pc  transfer  register 

io_on  Rl 

Rl  =  1  if  rC  transaction  is  ongoing 

no-op 

No-op  instruction 

Figure  7:  Figure  shows  the  ISA  for  the  Star-CPU 
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Figure  8:  Area  and  Frequency  comparison  among 
secure  CPUs. 

Comparing  the  verified  Star-CPU  to  Star-GLIFT,  we  ob¬ 
serve  that  the  Star-GLIFT  CPU  requires  23,956  ALUTs  for 
logic  and  2xl33kB  for  state  and  state  labels,  whereas  the 
Star-CPU  only  requires  5756  ALUTs  and  133kB  for  state. 
Adding  dynamic  tracking  logic  for  the  complex  control  logic 
of  the  CPU  introduces  substantial  delays  and  reduces  the 
maximum  operating  frequency  of  the  Star-GLIFT  CPU  to 
55MHz  (from  99MHz  for  the  verified  Star-CPU).  In  sum¬ 
mary,  the  verified  Star-CPU  provides  better  functionality 
than  the  Lease  CPU,  and  static  verification  in  comparison 
to  Star-GLIFT  CPU  with  much  lesser  area  and  delay. 

4.2  Kernel  Implementation 

Our  full-system  prototype  is  representative  of  a  high  as¬ 
surance  avionics  system  [1].  Each  trust  domain  in  the  system 
is  assigned  a  partition,  and  a  micro-kernel  manages  these 
partitions’  access  to  the  CPU,  physical  memory,  and  periph¬ 
eral  devices.  The  kernel  implements  an  ARINC  653  sched¬ 
uler  (a  standard  in  avionics  systems)  which  requires  that  all 
partitions  be  statically  defined  at  compile  time  in  the  form 
of  a  major  frame  period  that  repeats  forever.  Within  a  ma¬ 
jor  frame,  the  schedule  specifies  one  or  more  execution  time 
slots  for  each  partition,  leaving  the  partitions  free  to  imple¬ 
ment  standard,  priority-based  schedulers  within  their  time 
slots. 

Specifically,  our  prototype  instantiates  4  partitions:  the 
first  partition  to  run  programs  responsible  for  controlling 
trusted  avionics  functions,  the  second  partition  for  untrusted 
programs  such  as  passenger  internet  and  non-critical  diag¬ 
nostics,  the  third  partition  for  a  trusted  cross-domain  guard 
responsible  for  one-way  communication  among  the  above 
two  partitions,  and  the  fourth  partition  reserved  for  trusted 
kernel  functions  that  require  hardware  access  such  as  a  con- 


Kernel  Scheduler  1 :  //  uses  reserved  registers  R1  and  R2 


Context  Switch  1:  //from  partition  Oto  1 
//  save  current  state:  10  cycles 
set_partitionlD  KERNELJD; 
store  ctxt_arr  [PID0)(0],  gen_reg  [0]; ... 
store  ctxt_arr  [PID0](7],  gen_reg  [7]; 
last_PC  ctxt_arr  IPID0)(8]; 


//  set  bounds  and  timers  for  partition  1:  10  cycles 
load  Rl,  PIDl_BOUNDS_G;  load  R2,  PID1_TIME_G; 
set  memboundsg  Rl,  R2,  PID1_RW_G; 

load  Rl,  PIDl_BOUNDS_L;  load  R2,  PID1_TIME_L; 
set  memboundsi  Rl,  R2,  PID1_RW_L; 


//  restore  incoming  state;  9  cycies  load  Rl,  PiDl_TiMER;  load  R2,  Context  5witch2_PC: 

load  gen_reg[0],  ctiit_arr  [PID1][0]:  ...  settimer  Rl,  R2,  USR_MODE; 

load  gen_reg  [7],  ctxt_arr  [PID1][7]; 

set_partitionlD  PIDl;  jump  ctxt_arr  [PID1][8]: 

//  returns  to  context  switch2  ()  in  kernel  mode 

Context  switch  time  =  2  x  pipeline  depth  +  context  save  +  context  restore  +  kernel  scheduler  =  37  cycles 


Figure  9:  Kernel  scheduler  and  context  switch  func¬ 
tions  in  assembly.  Security  policies  are  expressed 
through  the  values  of  partition  parameters  for  mem¬ 
ory  and  time  bounds.  The  functions  are  small  since 
the  ISA  and  CPU  are  designed  for  information  flow 
control. 


Figure  10:  Figure  shows  how  to  check  for  a  safe  re¬ 
set  of  the  I^C  adapter  to  a  trusted  state  (based  on  a 
trusted  signal  time_valid).  The  Modelsim  simulation 
waveform  begins  with  the  adapter  in  an  untrusted 
time  slot  (time_valid  =  1).  During  the  untrusted 
communication,  the  adapter’s  state  bit  stays  un¬ 
known  (as  indicated  by  the  MSB  of  ad_state  being  1) 
and  untrusted  (ad_state_shadow  =  1)  until  the  time 
slot  expires  (indicated  by  the  time_valid  signal).  At 
that  point,  the  adapter’s  state  machine  is  reset  to  a 
trusted  state  (indicated  by  the  ad_state_shadow  sig¬ 
nal  going  low). 


text  switch.  To  effect  a  context  switch,  the  kernel  partition 
gets  one  time  slot  after  each  of  the  other  partitions’  slots. 
In  actnal  systems,  the  partitions  are  sized  so  that  they  meet 
hard  real-time  guarantees  demanded  by  critical  application; 
we  opt  for  arbitrary  dnrations  for  each  partition  in  order 
to  demonstrate  how  to  verify  non-interference  between  the 
trusted  and  untrusted  partitions. 

To  verify  non-interference,  we  instantiate  the  trusted  ker¬ 
nel  scheduler  and  the  context  switch  partition  with  concrete 
values,  while  the  other  three  partitions  are  instantiated  as 
unknown  (>i<).  The  cross-domain  guard  partition  overlaps 
a  read-only  memory  region  with  the  trusted  avionics  parti¬ 
tion  and  a  write-only  partition  with  the  untrusted  partition. 
This  ensures  that  information  can  only  flow  in  one  direc¬ 
tion  from  trusted  to  untrusted.  Note,  however,  that  cross¬ 
domain  guards  can  also  be  required  to  impose  restrictions 
on  the  type  of  information  that  can  be  transferred.  Enforc¬ 
ing  such  rules  requires  verifying  the  guard  program  logic 
using  alternative  formal  verification  techniques;  bit-level  in¬ 
formation  flow  analysis  is  too  coarse-grained  to  provide  such 
guarantees. 

The  scheduler  is  small;  the  trusted,  concretely  specified 
scheduler  and  context  switch  code  only  requires  87  assembly 
instructions.  This  is  primarily  because  the  ISA  in  Figure  7  is 
explicitly  designed  for  information  flow  control.  The  time  re¬ 
quired  to  switch  contexts  is  an  important  performance  met¬ 
ric  for  a  kernel.  In  our  system  prototype,  it  takes  the  kernel 
partition  37  cycles  to  switch  one  partition  and  schedule 
another:  4  cycles  each  to  flush  and  re-fill  the  pipeline,  19 
to  save  and  restore  partition  ID,  general  purpose  registers 
and  the  last  executed  PC,  10  to  set  new  memory  and  time 
bounds  and  to  jump  to  the  PC  for  the  restored  context. 

4.3  I/O  Implementation 

The  I^C  devices  and  adapters  are  also  processed  using  the 
verification  flow  mentioned  above  (note  that  the  CPU  and 
kernel  can  be  verified  independently  of  the  I/O  by  treating 
the  I/O  interface  in  the  CPU  as  trusted  and  unknown).  Our 
experiment  use  a  single  master  and  three  slaves  connected 
to  the  bus  using  adapters.  Each  adapter  synthesized  indi¬ 
vidually  requires  49b  of  state  and  the  slave  requires  21b.  We 
wish  to  verify  that  information  does  not  flow  from  the  un¬ 
trusted  slave  to  any  other  device,  and  set  up  a  test  where 
the  master  is  trusted  and  known,  communicates  with  an  un¬ 
trusted  slave  that  is  unknown  (**^).  The  I^C  bus  has  two 
trusted  slaves,  one  specified  and  one  unknown. 


4.4  Verification  Results 

We  simulate  the  augmented  designs  of  the  CPU  and  the 
I/O  system  for  one  loop  of  the  kernel  scheduler,  come  back  to 
the  initial  state,  and  verify  that  all  security  labels  for  mem¬ 
ory  and  outputs  follow  the  desired  policy  for  every  state  of 
the  system.  Figure  10  shows  a  screenshot  of  verifying  that 
the  I^C  adapter  state  is  reset  to  trusted  once  an  untrusted 
time  slot  has  ended.  We  simulate  a  complete  I^C  transac¬ 
tion,  letting  transmitted  data  values  be  unknown,  and  verily 
that  no  untrusted  value  ever  appears  on  the  adapter  outputs 
to  the  trusted  slave  devices.  This  experiment  used  3  slaves 
and  3  adapters  with  a  total  of  184  state  bits,  and  if  we 
assume  that  the  hardware  modules’  contributions  are  pro¬ 
portional  to  their  individual  sizes,  our  technique  can  verify 
the  184b  system  by  specifying  ~128b  concretely  and  eval¬ 
uating  all  combinations  of  the  rest  in  a  single  execution  of 
the  augmented  system. 

The  4<-logic  verification  technique  scales  to  handle  large 
embedded  system  designs,  as  shown  in  Figure  8.  Of  the 
total  133kB  state  for  the  Star-CPU,  only  3264b  are  required 
to  specify  the  micro-kernel’s  scheduler,  context  switch  code, 
and  partition  bounds.  The  verification  scales  because  its 
complexity  grows  linearly  with  the  size  of  the  design  under 
test,  as  each  module  is  replaced  by  its  augmented  module. 

Total  verification  time  includes  the  time  for  both  synthe¬ 
sizing  the  design  and  simulating  it  for  a  kernel  scheduler 
loop.  The  augmented  logic  takes  considerably  longer  to  syn¬ 
thesize  as  compared  to  the  basic  design  under  test.  The  aug¬ 
mented  Star-CPU,  with  48093  ALUTs,  required  14  hours  to 
synthesize  with  QuartusII  v9.1  as  compared  to  just  7  min¬ 
utes  for  the  basic  design  with  5756  ALUTs.  Once  synthe¬ 
sized,  simulating  one  loop  of  the  kernel  scheduler  only  takes 
a  few  seconds.  All  measurements  were  made  on  a  IGHz 
AMD  Athlon  64  X2  Dual  Core  Processor  with  1MB  cache 
and  2.7GB  RAM. 

In  the  future,  we  will  integrate  *-logic  with  other  formal 
techniques  that  can  work  with  richer  abstractions.  While 
such  techniques  do  not  readily  scale  to  large  systems,  these 
can  complement  >i=-logic  to  verify  that  the  system  is  secure  for 
a  set  of  implementations  or  kernel  parameter  values  instead 
of  one  specific  implementation. 

5.  CONCLUSIONS 

Embedded  systems  are  trusted  by  people  to  do  everything 
from  stopping  their  cars  to  controlling  the  beating  of  their 
hearts,  yet  all  too  often  these  systems  compromise  strong 


security  for  rich  functionality.  We  have  shown,  for  the  first 
time,  that  complete  statically  verifiable  information  flow  se¬ 
curity  is  compatible  with  the  convenience  of  continuous,  un¬ 
bounded  operation  and  dynamic  optimizations  -  even  when 
we  consider  timing  channels  and  other  hardware/software 
leaks  as  part  of  our  threat  model.  Our  system  is  designed 
around  an  architectural  skeleton  that  allows  a  micro-kernel 
to  safely  multiplex  mixed-trust  programs  on  the  hardware, 
and  can  do  so  in  1/4*^  the  area  and  with  double  the  clock 
frequency  as  more  restrictive  prior  work.  These  advances 
are  due  in  part  to  the  development  of  a  tool  that  can  stati¬ 
cally  verify  information  flows  through  full  systems  at  the  bit 
level,  allowing  us  to  verify  a  133kB  system  by  specifying  only 
3264b  concretely,  and  leaving  behind  the  hardware  dynamic 
flow  tracking  considered  in  prior  work.  While  more  work  is 
required  to  examine  the  broad  applicability  and  scalability  of 
this  approach,  by  implementing  and  verifying  a  full-system 
prototype  (including  a  CPU,  a  micro-kernel,  and  I^C  based 
I/O)  we  have  demonstrated  that  a  useful  balance  between 
flexibility  and  hardware  information  leakage  is  not  only  pos¬ 
sible  but  can  even  be  relatively  efficient. 
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